Economic Time Series Forecasting

Este post explora um teste com o pacote modeltime para forecasting de séries temporais combinando algoritmos de machine learning.


Anunciai-nos as coisas que ainda hão de vir, para que saibamos que sois deuses; ou fazei bem, ou fazei mal, para que nos assombremos, e juntamente o vejamos. Isaías 41:23




\[\\[1in]\]


Carrego as bibliotecas necessárias

library(xgboost)
library(earth)
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(tidyverse)
library(lubridate)
library(timetk)
library(quantmod)
library(tsibble)
library(tsibbledata)
library(dplyr)
library(tidyverse)
library(data.table)
library(tidyr)

Baixo os dados e separo amostra de treino

Utilizarei novamente os mesmos dados das cotações do milho CORN negociados na bolsa de Chicago:

CORN <- getSymbols("CORN", auto.assign = FALSE,
                   from = "1994-01-01", end = Sys.Date())
glimpse(CORN)
An 'xts' object on 2010-06-09/2021-06-18 containing:
  Data: num [1:2777, 1:6] 25.1 25.5 25.9 26 26.2 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:6] "CORN.Open" "CORN.High" "CORN.Low" "CORN.Close" ...
  Indexed by objects of class: [Date] TZ: UTC
  xts Attributes:  
List of 2
 $ src    : chr "yahoo"
 $ updated: POSIXct[1:1], format: "2021-06-20 14:24:05"
periodicity(CORN)
Daily periodicity from 2010-06-09 to 2021-06-18 
CORN <- CORN %>%
  as.data.table() %>%
  as_tibble()

class(CORN)
[1] "tbl_df"     "tbl"        "data.frame"
glimpse(CORN)
Rows: 2,777
Columns: 7
$ index         <date> 2010-06-09, 2010-06-10, 2010-06-11, 2010-06-14, 2010-06~
$ CORN.Open     <dbl> 25.12, 25.46, 25.88, 25.99, 26.24, 26.26, 26.20, 26.22, ~
$ CORN.High     <dbl> 25.25, 25.46, 25.88, 26.11, 26.24, 26.44, 26.20, 26.60, ~
$ CORN.Low      <dbl> 25.12, 25.46, 25.79, 25.99, 25.97, 26.20, 25.82, 26.22, ~
$ CORN.Close    <dbl> 25.15, 25.46, 25.79, 26.11, 25.97, 26.32, 26.08, 26.39, ~
$ CORN.Volume   <dbl> 1700, 200, 500, 2200, 7000, 2400, 1600, 3100, 9500, 2870~
$ CORN.Adjusted <dbl> 25.15, 25.46, 25.79, 26.11, 25.97, 26.32, 26.08, 26.39, ~
CORN %>%
  plot_time_series(index, CORN.Close, .interactive = FALSE)

Separa a série temporal numa proporção 70/30

splits <- initial_time_split(CORN, prop = 0.8)

Estimando os modelos univariados contra o tempo

Os modelos aqui testados são 14, sendo que eles contemplam:

  • Família ARIMA e com erros XGBoost;

  • Ajuste/Alisamento Exponencial e com ajustes sazonais e também modelo Croston e Theta

  • TBATS;

  • prophet e prophet_xgboost

  • Naive e SNaive

  • Rede Neural Artificial com componente Auto Regressivo

# Modelo 1: auto_arima ----
model_fit_arima_no_boost <- arima_reg() %>%
  set_engine(engine = "auto_arima") %>%
  fit(CORN.Close ~  index, 
      data = training(splits))
# Modelo 2: arima_boost ----
model_fit_arima_boosted <- arima_boost(
  min_n = 2,
  learn_rate = 0.9
) %>%
  set_engine(engine = "auto_arima_xgboost") %>%
  fit(CORN.Close ~ index,
      data = training(splits))
# Modelo 3: ets ----
model_fit_ets <- exp_smoothing() %>%
  set_engine(engine = "ets") %>%
  fit(CORN.Close ~  index, 
      data = training(splits))
# Modelo 4: tbats ----
model_fit_tbats <- seasonal_reg() %>%
  set_engine(engine = "tbats") %>%
  fit(CORN.Close ~ index,
      data = training(splits))
# Modelo 5: prophet xgb ----
model_fit_prophet_xb <- prophet_boost(
  learn_rate = 0.8 ) %>%
  set_engine("prophet_xgboost") %>%
  fit(CORN.Close ~  index,
      data = training(splits))
# Modelo 6: stlm_ets  ----
model_fit_stlm_ets <- seasonal_reg() %>%
  set_engine("stlm_ets") %>%
  fit(CORN.Close ~  index,
      data = training(splits))
# Modelo 7: stlm arima  ----
model_fit_stlm_arima <- seasonal_reg() %>%
  set_engine("stlm_arima") %>%
  fit(CORN.Close ~  index,
      data = training(splits))
# Modelo 8: arima xgb padrao  ----
model_fit_arima_padrao <-arima_boost(
                        # Padroes p, d, q
                        seasonal_period = 7,
                        non_seasonal_ar = 5,
                        non_seasonal_differences = 1,
                        non_seasonal_ma = 3,
                        seasonal_ar = 1,
                        seasonal_differences = 0,
                        seasonal_ma = 1,
                        # XGBoost 
                        tree_depth = 6,
                        learn_rate = 0.8
) %>%
  set_engine("arima_xgboost") %>%
  fit(CORN.Close ~  index,
      data = training(splits))
# Modelo 9: ets padrao  ----
model_fit_ets_padrao <- exp_smoothing(
                        seasonal_period = 7, # padrao de 7 dias de trade
                        error = "multiplicative",
                        trend = "additive",
                        season = "multiplicative"
) %>%
  set_engine("ets") %>%
  fit(CORN.Close ~  index,
      data = training(splits))
# Modelo 10: ets croston  ----
model_fit_ets_croston <- exp_smoothing(
  smooth_level = 0.2
) %>%
  set_engine("croston") %>%
  fit(CORN.Close ~  index,
      data = training(splits))
# Modelo 11: ets theta  ----
model_fit_ets_theta <- exp_smoothing(
) %>%
  set_engine("theta") %>%
  fit(CORN.Close ~  index,
      data = training(splits))
# Modelo 12: naive  ----
model_fit_naive <- naive_reg(
) %>%
  set_engine("naive") %>%
  fit(CORN.Close ~  index,
      data = training(splits))
# Modelo 13: snaive  ----
model_fit_snaive <- naive_reg(
                      seasonal_period = 7
) %>%
  set_engine("snaive") %>%
  fit(CORN.Close ~  index,
      data = training(splits))
# Modelo 14: rede neural autoregressiva  ----
model_fit_nnetar <-  nnetar_reg(
) %>%
set_engine("nnetar")

set.seed(123) 
model_fit_nnetar <- model_fit_nnetar %>%
  fit(CORN.Close ~  index,
      data = training(splits))

Comparativo de perfomance preditiva

Avalio o ajuste de cada método para a mesma amostra da série temporal:

models_tbl <- modeltime_table(

  # -- Modelo --                           -- Numero --
  
  model_fit_arima_no_boost,                 # Modelo 1
  model_fit_arima_boosted,                  # Modelo 2
  model_fit_ets,                            # Modelo 3
  model_fit_tbats,                          # Modelo 4
  model_fit_prophet_xb,                     # Modelo 5
  model_fit_stlm_ets,                       # Modelo 6
  model_fit_stlm_arima,                     # Modelo 7
  model_fit_arima_padrao,                   # Modelo 8
  model_fit_ets_padrao,                     # Modelo 9
  model_fit_ets_croston,                    # Modelo 10
  model_fit_ets_theta,                      # Modelo 11
  model_fit_naive,                          # Modelo 12
  model_fit_snaive,                         # Modelo 13
  model_fit_nnetar                          # Modelo 14

  ) 

models_tbl
# Modeltime Table
# A tibble: 14 x 3
   .model_id .model   .model_desc                  
       <int> <list>   <chr>                        
 1         1 <fit[+]> ARIMA(2,1,0)                 
 2         2 <fit[+]> ARIMA(0,1,2)                 
 3         3 <fit[+]> ETS(M,A,N)                   
 4         4 <fit[+]> BATS(0, {0,0}, -, -)         
 5         5 <fit[+]> PROPHET                      
 6         6 <fit[+]> SEASONAL DECOMP: ETS(M,A,N)  
 7         7 <fit[+]> SEASONAL DECOMP: ARIMA(0,1,0)
 8         8 <fit[+]> ARIMA(5,1,3)(1,0,1)[7]       
 9         9 <fit[+]> ETS(M,AD,M)                  
10        10 <fit[+]> CROSTON METHOD               
11        11 <fit[+]> THETA METHOD                 
12        12 <fit[+]> NAIVE                        
13        13 <fit[+]> SNAIVE [7]                   
14        14 <fit[+]> NNAR(1,1,10)[5]              

Compara a acurácia

models_tbl %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_accuracy(
  metric_set = metric_set(
    mape,
    smape,
    mae, 
    rmse,
    rsq # R^2
    )
)
# A tibble: 14 x 8
   .model_id .model_desc                .type  mape smape   mae  rmse        rsq
       <int> <chr>                      <chr> <dbl> <dbl> <dbl> <dbl>      <dbl>
 1         1 ARIMA(2,1,0)               Test   13.4  12.8  1.95  2.43    2.62e-5
 2         2 ARIMA(0,1,2)               Test   13.4  12.8  1.95  2.43    5.07e-5
 3         3 ETS(M,A,N)                 Test   12.5  14.4  2.12  3.43    7.60e-2
 4         4 BATS(0, {0,0}, -, -)       Test   13.4  12.8  1.95  2.43   NA      
 5         5 PROPHET                    Test   12.4  13.4  2.01  2.94    3.44e-2
 6         6 SEASONAL DECOMP: ETS(M,A,~ Test   12.6  14.5  2.13  3.45    7.60e-2
 7         7 SEASONAL DECOMP: ARIMA(0,~ Test   13.3  12.7  1.94  2.43    3.34e-5
 8         8 ARIMA(5,1,3)(1,0,1)[7]     Test   13.4  12.8  1.95  2.43    9.26e-5
 9         9 ETS(M,AD,M)                Test   13.2  12.7  1.94  2.42    1.36e-3
10        10 CROSTON METHOD             Test   13.5  12.9  1.96  2.44   NA      
11        11 THETA METHOD               Test   12.0  13.6  2.01  3.25    7.60e-2
12        12 NAIVE                      Test   13.4  12.8  1.95  2.43   NA      
13        13 SNAIVE [7]                 Test   13.4  12.8  1.95  2.44    2.93e-6
14        14 NNAR(1,1,10)[5]            Test   17.2  15.5  2.39  2.89    2.14e-3

Veremos o modelo prophet p. ex.:

list(model_fit_prophet_xb) %>%
    as_modeltime_table()
# Modeltime Table
# A tibble: 1 x 3
  .model_id .model   .model_desc
      <int> <list>   <chr>      
1         1 <fit[+]> PROPHET    

Calibração dos modelos

calibration_tbl <- models_tbl %>%
  modeltime_calibrate(new_data = testing(splits), quiet = FALSE)

calibration_tbl
# Modeltime Table
# A tibble: 14 x 5
   .model_id .model   .model_desc                   .type .calibration_data 
       <int> <list>   <chr>                         <chr> <list>            
 1         1 <fit[+]> ARIMA(2,1,0)                  Test  <tibble [556 x 4]>
 2         2 <fit[+]> ARIMA(0,1,2)                  Test  <tibble [556 x 4]>
 3         3 <fit[+]> ETS(M,A,N)                    Test  <tibble [556 x 4]>
 4         4 <fit[+]> BATS(0, {0,0}, -, -)          Test  <tibble [556 x 4]>
 5         5 <fit[+]> PROPHET                       Test  <tibble [556 x 4]>
 6         6 <fit[+]> SEASONAL DECOMP: ETS(M,A,N)   Test  <tibble [556 x 4]>
 7         7 <fit[+]> SEASONAL DECOMP: ARIMA(0,1,0) Test  <tibble [556 x 4]>
 8         8 <fit[+]> ARIMA(5,1,3)(1,0,1)[7]        Test  <tibble [556 x 4]>
 9         9 <fit[+]> ETS(M,AD,M)                   Test  <tibble [556 x 4]>
10        10 <fit[+]> CROSTON METHOD                Test  <tibble [556 x 4]>
11        11 <fit[+]> THETA METHOD                  Test  <tibble [556 x 4]>
12        12 <fit[+]> NAIVE                         Test  <tibble [556 x 4]>
13        13 <fit[+]> SNAIVE [7]                    Test  <tibble [556 x 4]>
14        14 <fit[+]> NNAR(1,1,10)[5]               Test  <tibble [556 x 4]>

Projeção (forecasting)

calibration_tbl %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = CORN
    ) %>%
    plot_modeltime_forecast(
      .legend_max_width = 25, # For mobile screens
      .interactive      = TRUE
    )

Modelo combinado de projeção

O modelo do tipo ensemble é um modelo misto com os pesos ajustados pela acurácia de cada um dos 14 univariados estimados até aqui

# Modelo 15: ensemble  ----
model_fit_ensemble <- models_tbl %>%
  ensemble_weighted(
    loadings = c
    (
      
#-- Peso --       # -- Modelo --
      
     1,           # Modelo 1
     2,           # Modelo 2
     5,           # Modelo 3
     6,           # Modelo 4
     10,          # Modelo 5
     9,           # Modelo 6
     8,           # Modelo 7 
     1,           # Modelo 8
     7,           # Modelo 9
     2,           # Modelo 10  
     2,           # Modelo 11 
     2,           # Modelo 12 
     1,           # Modelo 13 
     10           # Modelo 14 
      
      ),
    scale_loadings = TRUE
  )

model_fit_ensemble
-- Modeltime Ensemble -------------------------------------------
Ensemble of 14 Models (WEIGHTED)

# Modeltime Table
# A tibble: 14 x 4
   .model_id .model   .model_desc                   .loadings
       <int> <list>   <chr>                             <dbl>
 1         1 <fit[+]> ARIMA(2,1,0)                     0.0152
 2         2 <fit[+]> ARIMA(0,1,2)                     0.0303
 3         3 <fit[+]> ETS(M,A,N)                       0.0758
 4         4 <fit[+]> BATS(0, {0,0}, -, -)             0.0909
 5         5 <fit[+]> PROPHET                          0.152 
 6         6 <fit[+]> SEASONAL DECOMP: ETS(M,A,N)      0.136 
 7         7 <fit[+]> SEASONAL DECOMP: ARIMA(0,1,0)    0.121 
 8         8 <fit[+]> ARIMA(5,1,3)(1,0,1)[7]           0.0152
 9         9 <fit[+]> ETS(M,AD,M)                      0.106 
10        10 <fit[+]> CROSTON METHOD                   0.0303
11        11 <fit[+]> THETA METHOD                     0.0303
12        12 <fit[+]> NAIVE                            0.0303
13        13 <fit[+]> SNAIVE [7]                       0.0152
14        14 <fit[+]> NNAR(1,1,10)[5]                  0.152 

Construimos a projeção com o modelo misto:

# Calibragem

calibration_tbl <- modeltime_table(
    model_fit_ensemble
) %>%
    modeltime_calibrate(testing(splits), quiet = FALSE)

# Forecast vs amostra de teste (dados originais pra frente antes do corte pra amostra treino)

calibration_tbl %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = CORN
    ) %>%
    plot_modeltime_forecast(.interactive = TRUE)

Reamostragens

A reamostragem de séries temporais é uma estratégia importante para avaliar a estabilidade dos modelos ao longo do tempo. No entanto, é um tanto quanto penoso fazer isso, pois requer vários loops for para gerar as previsões para vários modelos e, potencialmente, vários grupos de séries temporais.

resamples_tscv <- training(splits) %>%
    time_series_cv(
                  assess = "2 years",
                  initial = "5 years",
                  skip = "1 years",
# Normally we do more than one slice, but this speeds up the example
slice_limit = 1
)
# Fit and generate resample predictions
model_fit_ensemble_resample <- calibration_tbl %>%
modeltime_fit_resamples(
            resamples = resamples_tscv,
            control = control_resamples(verbose = TRUE)
)
-- Fitting Resamples --------------------------------------------

0.22 sec elapsed
# A new data frame is created from the Modeltime Table
# with a column labeled, '.resample_results'
model_fit_ensemble_resample
# Modeltime Table
# A tibble: 1 x 6
  .model_id .model    .model_desc        .type .calibration_da~ .resample_resul~
      <int> <list>    <chr>              <chr> <list>           <list>          
1         1 <ensembl~ ENSEMBLE (WEIGHTE~ Test  <tibble [556 x ~ <lgl [1]>       

Plota o gráfico

model_fit_ensemble_resample %>%
  plot_modeltime_resamples(.interactive = TRUE)

Revisão das projeções (forecast forward)

Constrói a projeção para fora da amostra desde a última observação para os próximos 24 meses:

refit_tbl <- calibration_tbl %>%
    modeltime_refit(data = CORN)

refit_tbl %>%
    modeltime_forecast(h = "2 years", actual_data = CORN) %>%
    plot_modeltime_forecast(
      .legend_max_width = 25, # For mobile screens
      .interactive      = TRUE
    )

Estimativa dos modelos bivariados do tipo preço x volume (fluxo de oferta e demanda)

Na forma bivariada inserimos o volume de negociação como variável explicativa, ainda mesmo que esperando uma performance estatística inferior, sabemos que pelo princípio econômico, a volatilidade é facilmente capturada ao observarmos os picos de volume negociados diariamente.

# Modelo 1: auto_arima ----
model_fit_arima_no_boost <- arima_reg() %>%
  set_engine(engine = "auto_arima") %>%
  fit(CORN.Close ~ CORN.Volume + index, 
      data = training(splits))
# Modelo 2: arima_boost ----
model_fit_arima_boosted <- arima_boost(
  min_n = 2,
  learn_rate = 0.9
) %>%
  set_engine(engine = "auto_arima_xgboost") %>%
  fit(CORN.Close ~ CORN.Volume + index,
      data = training(splits))
# Modelo 3: ets ----
model_fit_ets <- exp_smoothing() %>%
  set_engine(engine = "ets") %>%
  fit(CORN.Close ~ CORN.Volume + index, 
      data = training(splits))
# Modelo 4: tbats ----
model_fit_tbats <- seasonal_reg() %>%
  set_engine(engine = "tbats") %>%
  fit(CORN.Close ~ CORN.Volume + index,
      data = training(splits))
# Modelo 5: prophet xgb ----
model_fit_prophet_xb <- prophet_boost(
  learn_rate = 0.8 ) %>%
  set_engine("prophet_xgboost") %>%
  fit(CORN.Close ~  CORN.Volume + index,
      data = training(splits))
# Modelo 6: stlm_ets  ----
model_fit_stlm_ets <- seasonal_reg() %>%
  set_engine("stlm_ets") %>%
  fit(CORN.Close ~  CORN.Volume + index,
      data = training(splits))
# Modelo 7: stlm arima  ----
model_fit_stlm_arima <- seasonal_reg() %>%
  set_engine("stlm_arima") %>%
  fit(CORN.Close ~  CORN.Volume + index,
      data = training(splits))
# Modelo 8: arima xgb padrao  ----
model_fit_arima_padrao <-arima_boost(
                        # Padroes p, d, q
                        seasonal_period = 7,
                        non_seasonal_ar = 5,
                        non_seasonal_differences = 1,
                        non_seasonal_ma = 3,
                        seasonal_ar = 1,
                        seasonal_differences = 0,
                        seasonal_ma = 1,
                        # XGBoost 
                        tree_depth = 6,
                        learn_rate = 0.8
) %>%
  set_engine("arima_xgboost") %>%
  fit(CORN.Close ~  CORN.Volume + index,
      data = training(splits))
# Modelo 9: ets padrao  ----
model_fit_ets_padrao <- exp_smoothing(
                        seasonal_period = 7, # padrao de 7 dias de trade
                        error = "multiplicative",
                        trend = "additive",
                        season = "multiplicative"
) %>%
  set_engine("ets") %>%
  fit(CORN.Close ~  CORN.Volume + index,
      data = training(splits))
# Modelo 10: ets croston  ----
model_fit_ets_croston <- exp_smoothing(
  smooth_level = 0.2
) %>%
  set_engine("croston") %>%
  fit(CORN.Close ~ CORN.Volume + index,
      data = training(splits))
# Modelo 11: ets theta  ----
model_fit_ets_theta <- exp_smoothing(
) %>%
  set_engine("theta") %>%
  fit(CORN.Close ~ CORN.Volume + index,
      data = training(splits))
# Modelo 12: naive  ----
model_fit_naive <- naive_reg(
) %>%
  set_engine("naive") %>%
  fit(CORN.Close ~ CORN.Volume + index,
      data = training(splits))
# Modelo 13: snaive  ----
model_fit_snaive <- naive_reg(
                      seasonal_period = 7
) %>%
  set_engine("snaive") %>%
  fit(CORN.Close ~  CORN.Volume + index,
      data = training(splits))
# Modelo 14: rede neural autoregressiva  ----
model_fit_nnetar <-  nnetar_reg(
) %>%
set_engine("nnetar")

set.seed(123) 
model_fit_nnetar <- model_fit_nnetar %>%
  fit(CORN.Close ~  CORN.Volume + index,
      data = training(splits))

Comparativo de perfomance preditiva (bivariado)

Avalio o ajuste de cada método para a mesma amostra da série temporal:

models_tbl_bivariada <- modeltime_table(

  # -- Modelo --                           -- Numero --
  
  model_fit_arima_no_boost,                 # Modelo 1
  model_fit_arima_boosted,                  # Modelo 2
  model_fit_ets,                            # Modelo 3
  model_fit_tbats,                          # Modelo 4
  model_fit_prophet_xb,                     # Modelo 5
  model_fit_stlm_ets,                       # Modelo 6
  model_fit_stlm_arima,                     # Modelo 7
  model_fit_arima_padrao,                   # Modelo 8
  model_fit_ets_padrao,                     # Modelo 9
  model_fit_ets_croston,                    # Modelo 10
  model_fit_ets_theta,                      # Modelo 11
  model_fit_naive,                          # Modelo 12
  model_fit_snaive,                         # Modelo 13
  model_fit_nnetar                          # Modelo 14

  ) 

models_tbl_bivariada
# Modeltime Table
# A tibble: 14 x 3
   .model_id .model   .model_desc                                         
       <int> <list>   <chr>                                               
 1         1 <fit[+]> REGRESSION WITH ARIMA(2,1,0) ERRORS                 
 2         2 <fit[+]> ARIMA(0,1,2) W/ XGBOOST ERRORS                      
 3         3 <fit[+]> ETS(M,A,N)                                          
 4         4 <fit[+]> BATS(0, {0,0}, -, -)                                
 5         5 <fit[+]> PROPHET                                             
 6         6 <fit[+]> SEASONAL DECOMP: ETS(M,A,N)                         
 7         7 <fit[+]> SEASONAL DECOMP: REGRESSION WITH ARIMA(0,1,0) ERRORS
 8         8 <fit[+]> ARIMA(5,1,3)(1,0,1)[7] W/ XGBOOST ERRORS            
 9         9 <fit[+]> ETS(M,AD,M)                                         
10        10 <fit[+]> CROSTON METHOD                                      
11        11 <fit[+]> THETA METHOD                                        
12        12 <fit[+]> NAIVE                                               
13        13 <fit[+]> SNAIVE [7]                                          
14        14 <fit[+]> NNAR(1,1,10)[5]                                     

Compara a acurácia (bivariado)

Tabela de avaliação da acurácia dos modelos bivariados no tempo:

models_tbl_bivariada %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_accuracy(
  metric_set = metric_set(
    mape,
    smape,
    mae, 
    rmse,
    rsq # R^2
    )
)
# A tibble: 14 x 8
   .model_id .model_desc                 .type  mape smape   mae  rmse       rsq
       <int> <chr>                       <chr> <dbl> <dbl> <dbl> <dbl>     <dbl>
 1         1 REGRESSION WITH ARIMA(2,1,~ Test   11.2  12.0  1.82  2.79   7.92e-2
 2         2 ARIMA(0,1,2) W/ XGBOOST ER~ Test   13.5  12.9  1.97  2.49   2.33e-3
 3         3 ETS(M,A,N)                  Test   12.5  14.4  2.12  3.43   7.60e-2
 4         4 BATS(0, {0,0}, -, -)        Test   13.4  12.8  1.95  2.43  NA      
 5         5 PROPHET                     Test   12.4  13.4  2.01  2.94   3.44e-2
 6         6 SEASONAL DECOMP: ETS(M,A,N) Test   12.6  14.5  2.13  3.45   7.60e-2
 7         7 SEASONAL DECOMP: REGRESSIO~ Test   13.3  12.7  1.94  2.43   3.68e-2
 8         8 ARIMA(5,1,3)(1,0,1)[7] W/ ~ Test   13.5  12.9  1.97  2.49   9.83e-4
 9         9 ETS(M,AD,M)                 Test   13.2  12.7  1.94  2.42   1.36e-3
10        10 CROSTON METHOD              Test   13.5  12.9  1.96  2.44  NA      
11        11 THETA METHOD                Test   12.0  13.6  2.01  3.25   7.60e-2
12        12 NAIVE                       Test   13.4  12.8  1.95  2.43  NA      
13        13 SNAIVE [7]                  Test   13.4  12.8  1.95  2.44   2.93e-6
14        14 NNAR(1,1,10)[5]             Test   82.6  47.9 12.1  16.5    5.55e-2

Veremos o modelo prophet p. ex.:

list(model_fit_prophet_xb) %>%
    as_modeltime_table()
# Modeltime Table
# A tibble: 1 x 3
  .model_id .model   .model_desc
      <int> <list>   <chr>      
1         1 <fit[+]> PROPHET    

Calibração dos modelos (bivariado)

calibration_tbl_bivariada <- models_tbl %>%
  modeltime_calibrate(new_data = testing(splits), quiet = FALSE)

calibration_tbl_bivariada
# Modeltime Table
# A tibble: 14 x 5
   .model_id .model   .model_desc                   .type .calibration_data 
       <int> <list>   <chr>                         <chr> <list>            
 1         1 <fit[+]> ARIMA(2,1,0)                  Test  <tibble [556 x 4]>
 2         2 <fit[+]> ARIMA(0,1,2)                  Test  <tibble [556 x 4]>
 3         3 <fit[+]> ETS(M,A,N)                    Test  <tibble [556 x 4]>
 4         4 <fit[+]> BATS(0, {0,0}, -, -)          Test  <tibble [556 x 4]>
 5         5 <fit[+]> PROPHET                       Test  <tibble [556 x 4]>
 6         6 <fit[+]> SEASONAL DECOMP: ETS(M,A,N)   Test  <tibble [556 x 4]>
 7         7 <fit[+]> SEASONAL DECOMP: ARIMA(0,1,0) Test  <tibble [556 x 4]>
 8         8 <fit[+]> ARIMA(5,1,3)(1,0,1)[7]        Test  <tibble [556 x 4]>
 9         9 <fit[+]> ETS(M,AD,M)                   Test  <tibble [556 x 4]>
10        10 <fit[+]> CROSTON METHOD                Test  <tibble [556 x 4]>
11        11 <fit[+]> THETA METHOD                  Test  <tibble [556 x 4]>
12        12 <fit[+]> NAIVE                         Test  <tibble [556 x 4]>
13        13 <fit[+]> SNAIVE [7]                    Test  <tibble [556 x 4]>
14        14 <fit[+]> NNAR(1,1,10)[5]               Test  <tibble [556 x 4]>

Projeção (forecasting) (bivariado)

calibration_tbl_bivariada %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = CORN
    ) %>%
    plot_modeltime_forecast(
      .legend_max_width = 25, # For mobile screens
      .interactive      = TRUE
    )

Modelo combinado de projeção (bivariado)

O modelo do tipo ensemble é um modelo misto com os pesos ajustados pela acurácia de cada um dos 14 univariados estimados até aqui

# Modelo 15: ensemble  ----
model_fit_ensemble_bivariada <- models_tbl_bivariada %>%
  ensemble_weighted(
    loadings = c
    (
      
#-- Peso --       # -- Modelo --
      
     1,           # Modelo 1
     10,          # Modelo 2
     5,           # Modelo 3
     6,           # Modelo 4
     10,          # Modelo 5
     9,           # Modelo 6
     8,           # Modelo 7 
     10,          # Modelo 8
     7,           # Modelo 9
     2,           # Modelo 10  
     2,           # Modelo 11 
     2,           # Modelo 12 
     1,           # Modelo 13 
     6            # Modelo 14 
      
      ),
    scale_loadings = TRUE
  )

model_fit_ensemble_bivariada
-- Modeltime Ensemble -------------------------------------------
Ensemble of 14 Models (WEIGHTED)

# Modeltime Table
# A tibble: 14 x 4
   .model_id .model   .model_desc                                      .loadings
       <int> <list>   <chr>                                                <dbl>
 1         1 <fit[+]> REGRESSION WITH ARIMA(2,1,0) ERRORS                 0.0127
 2         2 <fit[+]> ARIMA(0,1,2) W/ XGBOOST ERRORS                      0.127 
 3         3 <fit[+]> ETS(M,A,N)                                          0.0633
 4         4 <fit[+]> BATS(0, {0,0}, -, -)                                0.0759
 5         5 <fit[+]> PROPHET                                             0.127 
 6         6 <fit[+]> SEASONAL DECOMP: ETS(M,A,N)                         0.114 
 7         7 <fit[+]> SEASONAL DECOMP: REGRESSION WITH ARIMA(0,1,0) E~    0.101 
 8         8 <fit[+]> ARIMA(5,1,3)(1,0,1)[7] W/ XGBOOST ERRORS            0.127 
 9         9 <fit[+]> ETS(M,AD,M)                                         0.0886
10        10 <fit[+]> CROSTON METHOD                                      0.0253
11        11 <fit[+]> THETA METHOD                                        0.0253
12        12 <fit[+]> NAIVE                                               0.0253
13        13 <fit[+]> SNAIVE [7]                                          0.0127
14        14 <fit[+]> NNAR(1,1,10)[5]                                     0.0759

Construimos a projeção com o modelo misto:

# Calibragem

calibration_tbl_bivariada <- modeltime_table(
    model_fit_ensemble_bivariada
) %>%
    modeltime_calibrate(testing(splits), quiet = FALSE)

# Forecast vs amostra de teste (dados originais pra frente antes do corte pra amostra treino)

calibration_tbl_bivariada %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = CORN
    ) %>%
    plot_modeltime_forecast(.interactive = TRUE)

Reamostragens (bivariado)

A reamostragem de séries temporais é uma estratégia importante para avaliar a estabilidade dos modelos ao longo do tempo. No entanto, é um tanto quanto penoso fazer isso, pois requer vários loops for para gerar as previsões para vários modelos e, potencialmente, vários grupos de séries temporais.

resamples_tscv_bivariada <- training(splits) %>%
    time_series_cv(
                  assess = "7 years",
                  initial = "1 years",
                  skip = "1 years",
# Normally we do more than one slice, but this speeds up the example
slice_limit = 1
)
# Fit and generate resample predictions
model_fit_ensemble_resample_bivariada <- calibration_tbl_bivariada %>%
modeltime_fit_resamples(
            resamples = resamples_tscv,
            control = control_resamples(verbose = TRUE)
)
-- Fitting Resamples --------------------------------------------

0.01 sec elapsed
# A new data frame is created from the Modeltime Table
# with a column labeled, '.resample_results'
model_fit_ensemble_resample_bivariada
# Modeltime Table
# A tibble: 1 x 6
  .model_id .model    .model_desc        .type .calibration_da~ .resample_resul~
      <int> <list>    <chr>              <chr> <list>           <list>          
1         1 <ensembl~ ENSEMBLE (WEIGHTE~ Test  <tibble [556 x ~ <lgl [1]>       

Plota o gráfico

model_fit_ensemble_resample %>%
  plot_modeltime_resamples(.interactive = TRUE)

Revisão das projeções (forecast forward) (bivariado)

Constrói a projeção para fora da amostra desde a última observação para os próximos 24 meses:

refit_tbl_bivariada <- calibration_tbl_bivariada %>%
    modeltime_refit(data = CORN)

refit_tbl_bivariada %>%
    modeltime_forecast(h = "2 years", actual_data = CORN) %>%
    plot_modeltime_forecast(
      .legend_max_width = 25, # For mobile screens
      .interactive      = TRUE
    )

[continuar escrevendo…]

 


Referências

Business Science in R bloggers Introducing Modeltime Recursive: Tidy Autoregressive Forecasting with Lags, Disponível em: https://www.r-bloggers.com/2021/04/introducing-modeltime-recursive-tidy-autoregressive-forecasting-with-lags/

Business Science in R bloggers Introducing Modeltime: Tidy Time Series Forecasting using Tidymodels, Disponível em: https://www.r-bloggers.com/2020/06/introducing-modeltime-tidy-time-series-forecasting-using-tidymodels/

Business Science, Github, Ensemble Algorithms for Time Series Forecasting with Modeltime, Disponível em: https://business-science.github.io/modeltime.ensemble/ acesso em junho de 2021.

CRAN, Getting Started with Modeltime, Disponível em: https://cran.r-project.org/web/packages/modeltime/vignettes/getting-started-with-modeltime.html

CRAN, Package ‘modeltime’, Disponível em: https://cran.r-project.org/web/packages/modeltime/modeltime.pdf junho de 2021.

CRAN, Package ‘modeltime.ensemble’ Disponível em: https://cran.r-project.org/web/packages/modeltime.ensemble/modeltime.ensemble.pdf junho de 2021.

CRAN, Package ‘modeltime.resample’ Disponível em: https://cran.r-project.org/web/packages/modeltime.resample/modeltime.resample.pdf junho de 2021.

Github, Ensemble Algorithms for Time Series Forecasting with Modeltime, Disponível em: https://github.com/business-science/modeltime.ensemble acesso em junho de 2021.

HANSEN, B. Econometrics, University of Wisconsin, Department of Economics. Disponível em: https://www.ssc.wisc.edu/~bhansen/econometrics/Econometrics.pdf March 11, 2021.

USAI, D. Time Series Machine Learning Analysis and Demand Forecasting with H2O & TSstudio, Disponível em: https://rpubs.com/DiegoUsai/565288